Indication of non-verbal cues within a video communication session

ABSTRACT

Methods and systems provide for indication of non-verbal cues within a video communication session. In one embodiment, a method displays, for each of a number of participants within a video communication session, a user interface including participant windows corresponding to the plurality of participants, and a video for each of at least a subset of the participants, where the video is displayed within the corresponding participant window for the participant. The method analyzes, in real time, the video to detect a non-verbal cue from a participant. The method determines that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time. The method then displays, within the UI of at least one of the participants, a prompt associated with the non-verbal cue.

FIELD

The present application relates generally to digital communication, and more particularly, to systems and methods for providing indication of non-verbal cues within a video communication session.

SUMMARY

The appended claims may serve as a summary of this application.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure will become better understood from the detailed description and the drawings, wherein:

FIG. 1A is a diagram illustrating an exemplary environment in which some embodiments may operate.

FIG. 1B is a diagram illustrating an exemplary computer system that may execute instructions to perform some of the methods herein.

FIG. 2 is a flow chart illustrating an exemplary method that may be performed in some embodiments.

FIG. 3A is a diagram illustrating one example embodiment of a UI for a video communication session with multiple participants, according to some embodiments.

FIG. 3B is a diagram illustrating one example embodiment of displaying a prompt about a non-verbal cue to a host or speaking participant, according to some embodiments.

FIG. 3C is a diagram illustrating one example embodiment of displaying a prompt about a non-verbal cue to a participant associated with the non-verbal cue, according to some embodiments.

FIG. 3D is a diagram illustrating one example embodiment of displaying a prompt about a non-verbal cue on a mobile device, according to some embodiments.

FIG. 3E is a diagram illustrating one example embodiment of modifying the arrangement of participant windows based on a detected non-verbal cue, according to some embodiments.

FIG. 4 is a diagram illustrating an exemplary computer that may perform processing in some embodiments.

DETAILED DESCRIPTION OF THE DRAWINGS

In this specification, reference is made in detail to specific embodiments of the invention. Some of the embodiments or their aspects are illustrated in the drawings.

For clarity in explanation, the invention has been described with reference to specific embodiments; however, it should be understood that the invention is not limited to the described embodiments. On the contrary, the invention covers alternatives, modifications, and equivalents as may be included within its scope as defined by any patent claims. The following embodiments of the invention are set forth without any loss of generality to, and without imposing limitations on, the claimed invention. In the following description, specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In addition, well known features may not have been described in detail to avoid unnecessarily obscuring the invention.

In addition, it should be understood that steps of the exemplary methods set forth in this exemplary patent can be performed in different orders than the order presented in this specification. Furthermore, some steps of the exemplary methods may be performed in parallel rather than being performed sequentially. Also, the steps of the exemplary methods may be performed in a network environment in which some steps are performed by different computers in the networked environment.

Some embodiments are implemented by a computer system. A computer system may include a processor, a memory, and a non-transitory computer-readable medium. The memory and non-transitory medium may store instructions for performing methods and steps described herein.

Due to the explosive growth and global nature of remote video communication, today's global workplace presents more inclusion challenges than ever before. Participants of video communication sessions often span multiple cultures with differing cultural expectations regarding communication. As a result, it is common for meetings to have inequitable participation and engagement, issues left unresolved, or action items left unclear.

For example, a participant from one culture may not volunteer to speak or engage within the session of his or her own volition, because the cultural expectations of that participant may involve a hierarchical structure of communication such that the participant is expected to not interrupt someone higher in the organizational hierarchy until that superior specifically calls on the participant. The participant may additionally or alternatively be waiting for an appropriate pause in the conversation, but feel that there is not a sufficient pause to begin engaging. Instead of directly engaging verbally, the participant may provide some non-verbal cues that the participant wishes to engage. If the host participant or a currently speaking participant does not detect these non-verbal cues and act on them, however, then the participant may never engage. Such problems may be exacerbated by the nature of remote video communication, which typically relies on broadcasted video displayed in participant windows, rather than the more direct sensory awareness provided by in-person communication.

Thus, there is a need in the field of digital communication tools and platforms to create new and useful systems and methods for providing indication of non-verbal cues within a video communication session, based on detection of non-verbal cues from participants within the video communication session. As participants communicate within a video communication session, the system is configured to analyze each video to detect a non-verbal cue from a participant. If the system determines that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time, then the system can display a prompt to at least one of the participants. In various embodiments, this prompt can be a recommendation for the participant making the non-verbal cue to verbally engage, a recommendation for a speaking participant to prompt the participant making the non-verbal cue for input, or any other prompt relating to the non-verbal cue.

In one embodiment, a method displays, for each of a number of participants within a video communication session, a user interface (hereinafter “UI”) including participant windows corresponding to the plurality of participants, and a video for each of at least a subset of the participants, where the video is displayed within the corresponding participant window for the participant. The method analyzes, in real time, the video to detect a non-verbal cue from a participant. The method determines that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time. The method then displays, within the UI of at least one of the participants, a prompt associated with the non-verbal cue.

Further areas of applicability of the present disclosure will become apparent from the remainder of the detailed description, the claims, and the drawings. The detailed description and specific examples are intended for illustration only and are not intended to limit the scope of the disclosure.

FIG. 1A is a diagram illustrating an exemplary environment in which some embodiments may operate. In the exemplary environment 100, a client device 150 is connected to a processing engine 102 and, optionally, a video communication platform 140. The processing engine 102 is connected to the video communication platform 140, and optionally connected to one or more repositories and/or databases, including, e.g., a participant repository 130, non-verbal cue repository 132, and/or a location repository 134. One or more of the databases may be combined or split into multiple databases. The user's client device 150 in this environment may be a computer, and the video communication platform 140 and processing engine 102 may be applications or software hosted on a computer or multiple computers which are communicatively coupled via remote server or locally.

The exemplary environment 100 is illustrated with only one client device, one processing engine, and one video communication platform, though in practice there may be more or fewer additional client devices, processing engines, and/or video communication platforms. In some embodiments, the client device(s), processing engine, and/or video communication platform may be part of the same computer or device.

In an embodiment, the processing engine 102 may perform the exemplary method of FIG. 2 or other method herein and, as a result, provide indication of non-verbal cues within a video communication platform. In some embodiments, this may be accomplished via communication with the client device, processing engine, video communication platform, and/or other device(s) over a network between the device(s) and an application server or some other network server. In some embodiments, the processing engine 102 is an application, browser extension, or other piece of software hosted on a computer or similar device, or is itself a computer or similar device configured to host an application, browser extension, or other piece of software to perform some of the methods and embodiments herein.

The client device 150 is a device with a display configured to present information to a user of the device who is a participant of the video communication session. In some embodiments, the client device presents information in the form of a visual UI with multiple selectable UI elements or components. In some embodiments, the client device 150 is configured to send and receive signals and/or information to the processing engine 102 and/or video communication platform 140. In some embodiments, the client device is a computing device capable of hosting and executing one or more applications or other programs capable of sending and/or receiving information. In some embodiments, the client device may be a computer desktop or laptop, mobile phone, virtual assistant, virtual reality or augmented reality device, wearable, or any other suitable device capable of sending and receiving information. In some embodiments, the processing engine 102 and/or video communication platform 140 may be hosted in whole or in part as an application or web service executed on the client device 150. In some embodiments, one or more of the video communication platform 140, processing engine 102, and client device 150 may be the same device. In some embodiments, the user's client device 150 is associated with a first user account within a video communication platform, and one or more additional client device(s) may be associated with additional user account(s) within the video communication platform.

In some embodiments, optional repositories can include one or more of a participant repository 130, non-verbal cue repository 132, and/or location repository 134. The optional repositories function to store and/or maintain, respectively, information about participants of a video communication session; non-verbal cues to be detected within the video communication session; and location information related to participants along with, in some embodiments, non-verbal cues and/or cultural expectations of non-verbal cues which may be associated with those locations. The optional database(s) may also store and/or maintain any other suitable information for the processing engine 102 or video communication platform 140 to perform elements of the methods and systems herein. In some embodiments, the optional database(s) can be queried by one or more components of system 100 (e.g., by the processing engine 102), and specific stored data in the database(s) can be retrieved.

Video communication platform 140 is a platform configured to facilitate meetings, presentations (e.g., video presentations) and/or any other communication between two or more parties, such as within, e.g., a video conference or virtual classroom. A video communication session within the video communication platform 140 may be, e.g., one-to-many (e.g., a participant engaging in video communication with multiple attendees), one-to-one (e.g., two friends remotely communicating with one another by video), or many-to-many (e.g., multiple participants video conferencing with each other in a remote group setting).

FIG. 1B is a diagram illustrating an exemplary computer system 150 with software modules that may execute some of the functionality described herein. In some embodiments, the modules illustrated are components of the processing engine 102.

User interface module 152 functions to display, for each of a number of participants within a video communication session, a UI consisting of participant windows with a video for each of at least a subset of the participants.

Optional identification module 154 functions to identify a section of each video corresponding to the participant and/or one or more physical features of the participant, such as, e.g., eyes.

Analysis module 156 functions to analyze, in real time, each video to detect a non-verbal cue from the participant on a subject.

Threshold determination module 158 functions to determine that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time.

Prompt module 160 functions to display, within the UI of at least one of the participants, a prompt associated with the non-verbal cue.

Optional location module 162 functions to identify a geographical location for at least one of the participants, determine one or more non-verbal cues and/or cultural expectations of non-verbal cues based on the geographical location of the participant(s), and customize the prompt based on the determination(s).

The above modules and their functions will be described in further detail in relation to an exemplary method below.

FIG. 2 is a flow chart illustrating an exemplary method that may be performed in some embodiments.

At step 210, the system displays a UI for a video communication session. The UI is displayed for each of a number of participants within the video communication session. The UI includes at least a number of participant windows corresponding to the number of participants, and a video for each of at least a subset of the participants. The video for a participant is displayed within the corresponding participant window for that participant.

In some embodiments, the system connects participants to a live communication stream via their respective client devices. The communication stream may be any “session” (such as an instance of a video conference, webinar, informal chat session, or any other suitable session) initiated and hosted via the video communication platform, for remotely communicating with one or more users of the video communication platform, i.e., participants within the video communication session. Participants are connected on user devices, and are associated with user accounts within the communication platform.

The UI for the video communication session is displayed on the client device of each participant. In some embodiments, the UI appears different for different participants, or has different UI elements included for different participants depending on their user permissions, access levels (e.g., a premium-tier business user account as compared to a free-tier user account), or other aspects that may differentiate one participant from another within the video communication platform. In various embodiments, the UI is configured to allow the participant to, e.g., navigate within the video communication session, engage or interact with one or more functional elements within the video communication session, control one or more aspects of the video communication session, and/or configure one or more settings or preferences within the video communication session.

In some embodiments, the system receives a number of video feeds depicting imagery of a number of participants, the video feeds each having multiple video frames. In some embodiments, the video feeds are each generated via an external device, such as, e.g., a video camera or a smartphone with a built-in video camera, and then the video content is transmitted to the system. In some embodiments, the video content is generated within the system, such as on a participant's client device. For example, a participant may be using her smartphone to record video of herself giving a lecture. The video can be generated on the smartphone and then transmitted to the processing system, a local or remote repository, or some other location. In some embodiments, one or more of the video feeds are pre-recorded and are retrieved from local or remote repositories. In various embodiments, the video content can be streaming or broadcasted content, pre-recorded video content, or any other suitable form of video content. The video feeds each have multiple video frames, each of which may be individually or collectively processed by the processing engine of the system.

In some embodiments, the video feeds are received from one or more video cameras connected to a client device associated with each participant. Thus, for example, rather than using a camera built into the client device, an external camera can be used which transmits video to the client device, or some combination of both.

In some embodiments, the participants are users of a video communication platform, and are connected remotely within a virtual communication room generated by the communication platform. This virtual communication room may be, e.g., a virtual classroom or lecture hall, a group room, a breakout room for subgroups of a larger group, or any other suitable communication room which can be presented within a communication platform. In some embodiments, synchronous or asynchronous messaging may be included within the communication session, such that the participants are able to textually “chat with” (i.e., send messages back and forth between) one another in real time.

In some embodiments, the UI includes a number of selectable UI elements. For example, one UI may present selectable UI elements along the bottom of a communication session window, with the UI elements representing options the participant can enable or disable within the video session, settings to configure, and more. For example, UI elements may be present for, e.g., muting or unmuting audio, stopping or starting video of the participant, sharing the participant's screen with other participants, recording the video session, and/or ending the video session.

At least a portion of the UI displays a number of participant windows. The participant windows correspond to the multiple participants in the video communication session. Each participant is connected to the video communication session via a client device. In some embodiments, the participant window may include video, such as, e.g., video of the participant or some representation of the participant, a room the participant is in or virtual background, and/or some other visuals the participant may wish to share (e.g., a document, image, animation, or other visuals). In some embodiments, the participant's name (e.g., real name or chosen username) may appear in the participant window as well. One or more participant windows may be hidden within the UI, and selectable to be displayed at the user's discretion. Various configurations of the participant windows may be selectable by the user (e.g., a square grid of participant windows, a line of participant windows, or a single participant window). In some embodiments, the participant windows are arranged in a specific way according to one or more criteria, such as, e.g., current or most recent verbal participation, host status, level of engagement, and any other suitable criteria for arranging participant windows. Some participant windows may not contain any video, for example, if a participant has disabled video or does not have a connected video camera device (e.g., a built-in camera within a computer or smartphone, or an external camera device connected to a computer).
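As a minimal sketch of arranging participant windows by criteria such as host status, recency of verbal participation, and level of engagement, the ordering could be expressed as a sort with a composite key. The `ParticipantWindow` fields, the `engagement` score, and the particular priority order are hypothetical assumptions for illustration; the specification does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class ParticipantWindow:
    # All fields are hypothetical; the specification does not define a data model.
    name: str
    is_host: bool = False
    seconds_since_spoke: float = float("inf")  # inf: has not spoken yet
    engagement: float = 0.0                    # assumed score in [0, 1]
    has_video: bool = True


def arrange_windows(windows):
    """Order windows: host first, then most recent speakers, then higher engagement."""
    return sorted(
        windows,
        key=lambda w: (not w.is_host, w.seconds_since_spoke, -w.engagement, w.name),
    )


windows = [
    ParticipantWindow("Dana", seconds_since_spoke=40.0, engagement=0.6),
    ParticipantWindow("Ali", is_host=True, seconds_since_spoke=120.0),
    ParticipantWindow("Mei", seconds_since_spoke=5.0, engagement=0.9),
    ParticipantWindow("Sam", has_video=False, engagement=0.2),
]
order = [w.name for w in arrange_windows(windows)]
```

Other embodiments might weight these criteria differently or let the user choose the layout; the tuple key simply makes one possible priority order explicit.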

In some embodiments, at optional step 212, the system identifies one or more features of participants, such as, e.g., the eyes of participants. In some embodiments, the system identifies a section of each video corresponding to one or more specific physical features of a participant. In some embodiments, the physical feature(s) of the participant are detected via one or more video processing and/or analysis techniques. In some embodiments, the detection of the participant's features may be performed by one or more Artificial Intelligence (AI) engines. Such AI engine(s) may be configured to perform aspects or techniques associated with, e.g., machine learning, neural networks, deep learning, computer vision, or any other suitable AI aspects or techniques. In some embodiments, such AI engine(s) may be trained on a multitude of differing images of previous participant imagery and/or features appearing within video content, as well as images where participant imagery and/or features do not appear within video content. In some embodiments, participant imagery and/or physical features may be labeled within at least some of the training data. In some embodiments, the AI engine(s) are trained to classify, within a certain range of confidence, whether a user appears or does not appear within a given piece of video content.

In some embodiments, the system detects a face region or eye region within the video content. In some embodiments, as in previous steps, the system may detect the face region or eye region using one or more aspects or techniques of AI engine(s). For example, in some embodiments a deep learning model may be used for face and/or eye detection. Such a deep learning model may be trained based on, e.g., a multitude of images of users' faces and/or eyes within cropped and/or uncropped images from video content. In some embodiments, one or more facial or eye recognition algorithms are used. In some embodiments, feature-based methods may be employed. In some embodiments, statistical tools for geometry-based or template-based face or eye recognition may be used, such as, e.g., Support Vector Machines (SVM), Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Kernel methods or Trace Transforms. Such methods may analyze local facial features and their geometric relationships. In some embodiments, techniques or aspects may be piecemeal, appearance-based, model-based, template matching-based, or any other suitable techniques or aspects for detecting a face and/or eye region.

At step 214, the system detects a non-verbal cue from a participant. In particular, the system analyzes, in real time, each video to detect a non-verbal cue from a participant on a subject.

A non-verbal cue may include any movement, gesture, or behavior of a participant reflected within the video of that participant in a video session. Non-verbal cues may include, e.g., postures (e.g., standing up straight, leaning forward, repositioning in one's chair, stretching, shoulder height), facial expressions (e.g., a smile, a smirk, a confused expression, or a shocked expression), aspects of eye gaze (e.g., sustained eye contact, eye rolling, widened eyes), gestures (e.g., raising a hand, pointing at a subject, or shrugging shoulders), and paralinguistic features (e.g., tone of voice or affect, loudness), body language, and proxemics or use of personal space.

In some embodiments, as in previous steps, the system may detect the non-verbal cue using one or more aspects or techniques of AI engine(s). For example, in some embodiments a deep learning model may be used for detection and classification of participant movements, facial expressions, or any other suitable indication of a non-verbal cue. Such a deep learning model may be trained based on, e.g., a multitude of still frames and/or video of users' physical movements, gestures, facial expressions, and/or other suitable data from video content. The training data may include still frames and/or video from a number of prior video communication sessions, either with one or more of the participants from this video session or with none of the participants from this video session.

In some embodiments, the system is configured to detect and classify the non-verbal cue from a list of pre-designated non-verbal cues. Such a list of non-verbal cues may be the same for any participant. In some embodiments, the system identifies a geographical location of a participant broadcasting video, then detects and classifies the non-verbal cue from a list of pre-designated non-verbal cues for that specific geographical location. In this way, the non-verbal cues which may be considered to be detected for a given participant can be informed by the, e.g., country, region, or other geographical location the participant resides in. In some embodiments, this geographical data about the participant can be determined based on, e.g., IP address of the participant, data from the client device the participant is using (such as, for example, GPS data), data from a user profile or user settings or preferences within the video communication platform, or any other form of obtaining a geographical location of the participant. In some embodiments, rather than identifying a current geographical location, the system can identify a country or region of origin. In some embodiments this may be self-identified by the participant, retrieved from existing participant information from the video communication platform, may be determined as a prediction from the system, or any other suitable method. In some embodiments, a combination of current geographical location and country or region of origin may be used to classify a non-verbal cue from a list of non-verbal cues that represents a superset of the sets of non-verbal cues from the current geographical location and the country or region of origin.
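The selection of a candidate cue list from location data can be sketched as a lookup followed by a set union, which yields the superset described above when both a current location and a region of origin are known. The region keys, cue names, and the `candidate_cues` helper are hypothetical placeholders, not values taken from any real repository.

```python
# Hypothetical cue lists; a real system might load these from the
# non-verbal cue repository 132 or location repository 134.
DEFAULT_CUES = frozenset({"raised_hand", "sustained_eye_contact"})
CUES_BY_LOCATION = {
    "region_a": frozenset({"raised_hand", "leaning_forward"}),
    "region_b": frozenset({"sustained_eye_contact", "nodding"}),
}


def candidate_cues(current_location=None, origin=None):
    """Return the cues to look for, as the union of the per-location lists.

    Falls back to the location-independent default list when neither
    location is known to the lookup table.
    """
    found = [
        CUES_BY_LOCATION[loc]
        for loc in (current_location, origin)
        if loc in CUES_BY_LOCATION
    ]
    if not found:
        return DEFAULT_CUES
    return frozenset().union(*found)


both = candidate_cues(current_location="region_a", origin="region_b")
```

Using the union means a cue recognized in either location is still considered, at the cost of a larger list for the classifier to discriminate between.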

In some embodiments where a pre-defined list of non-verbal cues is used in the detection of the non-verbal cue from the participant, each of the non-verbal cues includes a pre-defined signature for detection of the non-verbal cue. In some embodiments, the system first builds an AI model for the detection of such signatures from the list of non-verbal cues, trains the model using training data which includes video content with various non-verbal cues, and then uses the model to detect the non-verbal cue from the participant. In some embodiments, this AI model is combined with a behavioral model which uses a behavioral profile of the participant to determine which non-verbal cues and signatures to look for. In some cases, the system builds this behavioral model for the participant using AI-learned non-verbal cues and signatures from the participant. In some embodiments, this behavioral model may learn non-verbal cues and signatures which the system was previously unaware of, or which may be unique to the participant. In some embodiments, the system may additionally or alternatively use non-auditory feedback systems to build an engagement model for detecting engagement or lack thereof within the participant, and for increasing engagement in the participant using one or more additional models such as, e.g., a behavioral model or non-verbal cue detection model.

In some embodiments, non-verbal cues may include expressions of disinterest or unwillingness to participate. Such non-verbal cues may be, for example, slouching in one's seat, leaning backwards, lack of eye contact for a fixed period of time, or other similar cues which may signal lack of interest in participating in some cultures. In some embodiments, such cues are the inverse of cues suggesting that the participant wishes to participate, and may be used for prompts and/or for recommending actions based on that disinterest.

In some embodiments, the non-verbal cue is eye contact. The system identifies a section of each video corresponding to the eyes of the participant, then analyzes, in real time, each video to detect any sustained eye contact from a participant on a subject. The eye contact may be sustained for a duration that exceeds a designated threshold of time, as will be described with respect to step 216. In various embodiments, the subject may be any subject on camera (i.e., in the room the participant is in), any subject corresponding to the screen of the client device the participant is using, the camera itself, or any other suitable subject. In some embodiments, the system identifies a geographical location of the participant making the sustained eye contact, and the detection of the eye contact is based on non-verbal cues associated with that geographical location.
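One simple way to decide, frame by frame, whether a participant's gaze rests on a subject is to test an estimated gaze point against a region that represents the subject. The sketch below assumes that an upstream eye-tracking step (not shown, and not specified here) already yields a normalized gaze point per frame, or `None` when the eyes were not found; the `Region` type and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in normalized screen coordinates (assumed)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def eye_contact_flags(gaze_points, subject_region):
    """Map per-frame gaze estimates to booleans; None counts as no eye contact."""
    return [
        point is not None and subject_region.contains(*point)
        for point in gaze_points
    ]


# Hypothetical region standing in for the area around the camera.
camera_region = Region(left=0.4, top=0.0, right=0.6, bottom=0.2)
flags = eye_contact_flags(
    [(0.5, 0.1), None, (0.9, 0.9), (0.4, 0.2)],
    camera_region,
)
```

The resulting per-frame flags are the kind of input a duration check against a designated threshold could consume.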

In some embodiments, the system receives a behavioral profile for the participant associated with the non-verbal cue. In varying embodiments, the behavioral profile may be an existing profile related to a particular participant and/or their user account within the video communication platform. The behavioral profile may include a number of behaviors associated with that participant with respect to the video communication platform, such as, e.g., preferences for video sessions, previous selections of options within the video communication platform, routines or habits detected by the video communication platform with respect to video sessions or use of the client device, detected metrics of user engagement for past and/or current video sessions, or any other suitable behaviors within the video communication platform. In some embodiments, the system determines one or more non-verbal cues associated with the behavioral profile for the participant. In some embodiments, this may include determining a set of non-verbal cues from the listed behaviors from the behavioral profile. In some embodiments, the prompt associated with the non-verbal cue is customized based on the determined non-verbal cues and behavioral profile. For example, the prompt can include aspects of the user's behavior with respect to non-verbal cues in order to personalize the prompt to that user and their learned non-verbal behaviors.

In some embodiments, the system receives behavioral profiles for one or more of a host participant and a currently speaking participant. The system determines one or more expectations of non-verbal cues associated with the behavioral profiles for the participants. The prompt associated with the non-verbal cue is then customized based on the determined expectations of non-verbal cues and the behavioral profiles. For example, the prompt can include aspects of the participant's behavior and expectations around non-verbal cues to personalize the prompt to that user.

At step 216, the system determines that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time.

In some embodiments, the designated threshold of time corresponds to the specific non-verbal cue that has been detected. For example, if the non-verbal cue that has been detected in step 214 is eye contact (e.g., eye contact with the camera, with another person in the room, or with the area of their screen or UI that corresponds to a given participant), then a designated threshold of time may be retrieved for sustaining eye contact with that subject. If the amount of time the participant maintains eye contact with that subject exceeds this designated threshold, then the system determines that the threshold has been satisfied. In some embodiments, the designated threshold for a given non-verbal cue may be modified depending on the specific geographical location and/or country or region of origin associated with the participant. For example, participants from some countries may sustain eye contact for a longer period than others to indicate that the participant wishes to interject in a discussion, depending on varying cultural expectations around non-verbal cues.
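The determination at step 216 can be sketched as a small tracker that records when a cue was first seen and compares the elapsed time against a per-cue threshold, optionally scaled by location. The threshold values, the location multipliers, and the names `threshold_for` and `SustainTracker` are hypothetical assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values in seconds; real thresholds would be retrieved per cue.
BASE_THRESHOLDS = {"eye_contact": 3.0, "raised_hand": 1.5}
# Hypothetical per-location multipliers applied to the base threshold.
LOCATION_SCALE = {"region_a": 1.5}


def threshold_for(cue, location=None):
    """Look up the designated threshold for a cue, adjusted by location."""
    return BASE_THRESHOLDS[cue] * LOCATION_SCALE.get(location, 1.0)


@dataclass
class SustainTracker:
    cue: str
    location: Optional[str] = None
    started_at: Optional[float] = None

    def update(self, timestamp, cue_present):
        """Feed one per-frame detection result.

        Returns True once the cue has been continuously present for longer
        than the designated threshold; any frame without the cue resets it.
        """
        if not cue_present:
            self.started_at = None
            return False
        if self.started_at is None:
            self.started_at = timestamp
        elapsed = timestamp - self.started_at
        return elapsed > threshold_for(self.cue, self.location)


tracker = SustainTracker("eye_contact")
results = [tracker.update(t, True) for t in (0.0, 1.0, 2.0, 3.0, 3.5)]
```

In this sketch the comparison is strict, matching the "exceeds" wording, so a cue held for exactly the threshold duration does not yet trigger. A production system would likely tolerate brief detection dropouts rather than resetting on a single missed frame.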

At step 218, the system displays, within the UI of at least one of the participants, a prompt associated with the non-verbal cue. In varying embodiments, the prompt may be a notification (e.g., a push notification or pop-up notification), recommendation, message (e.g., a chat message or an SMS message), or any other suitable prompt. For example, the prompt may be displayed within, e.g., a chat section of the video communication UI, a pop-up toast notification, or within a separate mobile phone application.

In some embodiments, the participant associated with the non-verbal cue is prompted, i.e., the prompt associated with the non-verbal cue is displayed to the participant in the video associated with the non-verbal cue. In some embodiments, the prompt to the participant consists of a recommendation to verbally engage with one or more participants within the video communication session. For example, the prompt may read, “You appear to have something to say. Raise your hand so the speaker knows!”, or “It looks like you want to speak up. You may do so now.” In some embodiments, the prompt may appear in a specific time interval where a pause in the conversation has occurred, or when a pause is anticipated by the system to occur soon based on the patterns of speech of the currently speaking participant(s).

In some embodiments, the prompt to the participant may be a recommendation to engage with one or more UI components indicating an intent to verbally engage. For example, there may be a UI component marked “raise your hand” which, when engaged by the participant at the client device, signals that the participant is virtually raising their hand as if to signal that they have something to say, and would like to be called upon by the host participant of the session or a currently speaking participant. The prompt may recommend to the participant that he or she should use the included “raise your hand” feature in one click in order to give a non-verbal indication of a desire to participate in the discussion. The host or speaking participant, in turn, may see the indication that the participant has virtually raised their hand, and may cede the floor to the participant for their input. Many other such UI components and prompts may be contemplated.

In some embodiments, the system identifies a geographical location and/or country or region of origin for the participant associated with the non-verbal cue. The detection of the non-verbal cue in step 214 includes identifying a non-verbal cue from a list of non-verbal cues associated with that geographical location, country or region of origin, or both. In some embodiments, the prompt associated with the non-verbal cue can then be customized for that participant based on the determined non-verbal cues.

In varying embodiments, the prompt associated with the non-verbal cue is displayed to a host participant of the video communication session, a currently speaking participant of the video communication session, or both. For example, the host may be prompted by a message indicating that the participant in question is likely to have additional insights, such as, “Naoto appears to be highly engaged. Prompt them for input!” The host may see the prompt and then invite the participant for their input in the discussion. In some embodiments, the prompt includes a notification that the participant associated with the non-verbal cue intends to verbally engage, rather than, e.g., a recommendation to prompt the participant for input. In some embodiments, the prompt may include a reminder of cultural differences between participants, and may, in some cases, include specific information about geographical differences and/or the cultural expectations for communications associated with them.

In some embodiments, for each participant for which the prompt is displayed, the system identifies a geographical location and/or country or region of origin associated with that participant, then determines one or more cultural expectations of non-verbal cues based on the geographical location. The prompt displayed for the participant is then customized based on the one or more determined cultural expectations for that geographical location and/or country or region of origin associated with that participant. In some embodiments, the cultural expectations of non-verbal cues may be pre-designated based on that specific geographical location, country or region of origin, or some combination thereof. Cultural expectations of non-verbal cues may include one or more pieces of information about non-verbal cues, including, e.g., a list of non-verbal cues (e.g., a head shake, eye contact, rubbing one's neck, raised eyebrows, or any other suitable non-verbal cues), the expected duration for that non-verbal cue to be exhibited in order for the cue to indicate something (e.g., 5 seconds or longer), what the non-verbal cue indicates (e.g., sustained eye contact may indicate that the subject wishes to add something to the discussion or ask a question), and any other information which may be suitable. In varying embodiments, cultural expectations may additionally or alternatively include or be modified by, e.g., established social norms, organizational norms, relational norms, situational factors and context, personality characteristics, and level of familiarity with other participants.
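As a non-limiting illustration, pre-designated cultural expectations and the customization of a prompt might be represented as in the following sketch. All names here (CueExpectation, EXPECTATIONS, lookup, build_host_prompt) and the region labels are hypothetical, and the table entry is a placeholder rather than real cultural data. The sketch assumes one possible customization rule: a reminder is appended only when the viewer of the prompt is associated with a different region than the participant exhibiting the cue.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CueExpectation:
    """One pre-designated cultural expectation for a non-verbal cue."""
    cue: str
    expected_duration_s: float  # how long the cue must last to carry meaning
    meaning: str                # what the cue is taken to indicate


# Hypothetical table keyed by (region, cue); the entry is a placeholder.
EXPECTATIONS: Dict[Tuple[str, str], CueExpectation] = {
    ("region_a", "eye_contact"): CueExpectation(
        "eye_contact", 5.0, "sustained eye contact may indicate a wish to interject"
    ),
}


def lookup(region: str, cue: str) -> Optional[CueExpectation]:
    """Return the expectation for a cue in a region, or None if none is designated."""
    return EXPECTATIONS.get((region, cue))


def build_host_prompt(name: str, cue: str, cue_region: str, viewer_region: str) -> str:
    """Build a prompt for a host or speaker, adding a reminder when regions differ."""
    prompt = f"{name} appears to be highly engaged. Prompt them for input!"
    expectation = lookup(cue_region, cue)
    if expectation is not None and viewer_region != cue_region:
        prompt += f" Reminder: for {name}, {expectation.meaning}."
    return prompt
```

Other customization rules, such as those drawing on organizational or relational norms, could replace the region comparison used here.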

In some embodiments, the system may detect or receive indication of a termination of the video communication session. Once the video communication session has terminated, the system may be configured to display one or more metrics to at least one participant. For example, the system may send a report including various metrics to a host participant or speaking participant of the video communication session. The system may additionally or alternatively display a dashboard including various metrics, send an email, provide a notification or chat message, or otherwise send or display the information. In some embodiments, a visual indication of one or more metrics may be presented, such as, e.g., a histogram based on content.

Such metrics presented by the system may include, in various embodiments, at least one of: determined participant engagement, detected non-verbal cues, identified geographical locations of participants, and determined cultural expectations of participants regarding non-verbal cues during the video communication session. For example, a report may show that listening participants were highly engaged during a specific window of time during which a speaker was presenting. In some embodiments, highlights of the video communication session when participants were highly engaged may be recorded and made accessible for playback by one or more participants. Engagement metrics may be determined by the system after the video communication has terminated, based on detection of non-verbal cues from participants. In some embodiments, the system may provide information on how often participants are in video communication sessions that have a similar set of cultural expectations around communication and non-verbal cues, or how often or when participants are in meetings that resonate with their communication style compared to how often or when they are in meetings that do not resonate with their communication style. In some embodiments, the system may provide recommendations, such as recommended actions or modifications to communication style, which may result in more engagement with participants with certain cultural expectations or communication styles. In some embodiments, the system may present a participant with information on other specific participants and observations the system has detected about them during the video communication session. For example, the system may provide a notification to a speaking participant that a particular user had something to say for 15 minutes but did not interject into the discussion. Such information may lead speakers to modify their behavior to more proactively include or make conversational space for particular participants during discussion. Many other such metrics, notifications, and/or recommendations may be contemplated.
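One possible way to derive a post-session engagement histogram and candidate highlight windows from logged cue detections is sketched below, for illustration only. The event tuple layout, the function names engagement_histogram and highlight_windows, the 60-second bucket size, and the min_events cutoff are all assumptions made for the sketch and are not specified in this description.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event record: (seconds from session start, participant id, cue name).
CueEvent = Tuple[float, str, str]


def engagement_histogram(events: Iterable[CueEvent], bucket_s: float = 60.0) -> Counter:
    """Count detected cues per fixed-length time bucket, keyed by bucket index."""
    return Counter(int(t // bucket_s) for t, _, _ in events)


def highlight_windows(
    events: Iterable[CueEvent], bucket_s: float = 60.0, min_events: int = 2
) -> List[Tuple[float, float]]:
    """Return (start, end) windows whose cue count meets the assumed cutoff."""
    hist = engagement_histogram(events, bucket_s)
    return [
        (bucket * bucket_s, (bucket + 1) * bucket_s)
        for bucket in sorted(hist)
        if hist[bucket] >= min_events
    ]
```

The windows returned by highlight_windows could, under these assumptions, be used to select which recorded portions of the session are offered for playback.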

FIG. 3A is a diagram illustrating one example embodiment of a UI for a video communication session with multiple participants, according to some embodiments.

User interface 300 depicts a UI that a particular participant is viewing on a screen of the participant's client device. Four participant windows are displayed within the UI, arranged in a 2×2 grid. Within each participant window is a video. The video in each of the participant windows is a live video feed captured via a camera or other device that is either built into or connected to the client device of that participant, then streamed to the UIs of participants. Also appearing in the bottom left corner of each participant window is a name of the participant, as well as an icon indicating that the participant has their audio muted, if applicable. In the top right, a selectable UI element allows a participant to toggle between a full screen view and non-full-screen view. To the right, a chat or messaging section of the UI allows participants to enter messages to be displayed while the video communication session proceeds.

A bar at the bottom of the UI presents a number of selectable UI elements within the UI. These elements include Mute, Stop Video, Security, Participants, Chat, Share Screen, Polling, Record, Closed Caption, Reactions, More, and End.

Within this example embodiment, participant 302 is currently in the process of speaking on a subject of discussion. Participant 304 is the host of the video communication session, and is currently silently listening to the speaking participant 302. Participant 306 is currently silently listening to the speaking participant 302, and is also currently maintaining sustained eye contact with the camera.

FIG. 3B is a diagram illustrating one example embodiment of displaying a prompt about a non-verbal cue to a host or speaking participant, according to some embodiments.

In the example of FIG. 3B, the UI and participants from FIG. 3A are present. The UI in question belongs to speaking participant 302, who is viewing the UI on the client device's screen while speaking. While participant 302 is speaking, a prompt 310 appears in the chat section of the UI, reading, “Naoto appears to be highly engaged. Prompt them for input!” Naoto is participant 306, who is silently listening while maintaining sustained eye contact with the camera. Naoto is signaling with a non-verbal cue that she would like to interject with a question or comment. The system has analyzed the video to detect possible non-verbal cues from the participants. For Naoto, the system classifies potential non-verbal cues from among a list of non-verbal cues associated with Naoto's geographical location, which is different from speaking participant 302's location and host participant 304's location. The system classifies Naoto as having a non-verbal cue of sustained eye contact on a subject. The system further determines that the sustained eye contact has been maintained past a designated duration threshold of 5 seconds, which has been designated within the platform to be the threshold of eye contact in Naoto's geographical location to indicate that a listener likely wishes to interject. Thus, the system takes the next step of displaying a prompt to the speaking participant 302 recommending that the speaking participant prompt the participant 306 for input on the subject.

FIG. 3C is a diagram illustrating one example embodiment of displaying a prompt about a non-verbal cue to a participant associated with the non-verbal cue, according to some embodiments.

The example in FIG. 3C is similar to the example in FIG. 3B, except that the UI now belongs to the participant 306 who is exhibiting a non-verbal cue by maintaining fixed eye contact. For this participant, the system has detected the non-verbal cue as discussed with respect to FIG. 3B. The system displays a prompt to the participant within the chat section of the UI, reading, “You appear to have something to say. Raise your hand so the speaker knows!” Upon reading the prompt, the participant may physically raise their hand, indicating to the speaking participant 302 and/or host participant 304 that the participant wishes to interject and engage in discussion. In some embodiments, the system may additionally determine that the geographical location of the speaking participant 302 and/or host participant 304 has a cultural expectation that raising one's hand indicates that one wishes to interject to add a comment or ask a question, and customizes the prompt accordingly to recommend to the participant that they should raise their hand.

FIG. 3D is a diagram illustrating one example embodiment of displaying a prompt about a non-verbal cue on a mobile device, according to some embodiments. Within the example embodiment of FIG. 3D, the scenario of FIG. 3B has occurred, but instead of the prompt being displayed at the chat section of the video communication UI, the speaking participant receives a prompt 322 as a notification displayed on a mobile device 320, e.g., a smartphone belonging to the user. In some embodiments, the notification may appear, for example, on a lock screen of the mobile device, and may additionally or alternatively appear within an application for the video communication platform. In some embodiments, the participant may receive a vibration, notification sound, or other alert or signal indicating that a notification has occurred.

FIG. 3E is a diagram illustrating one example embodiment of modifying the arrangement of participant windows based on the detected non-verbal cue, according to some embodiments. Within the example embodiment, the system has detected a non-verbal cue from participant 306. The system then responds by altering the arrangement of the grid of participant windows to reflect participant 306's increased engagement and indicated desire to participate. Rather than the participant 306's window appearing in the top right portion of the grid, her window appears in the top left portion of the grid, a higher placement reflecting the increased engagement. In this respect, the grid may be adjusted in real time according to detected non-verbal cues from participants.
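The rearrangement of FIG. 3E might be sketched, purely as an illustration, as a reordering of a list of participant windows that is then laid out row by row. The function names promote_participant and to_grid are hypothetical, as is the fourth participant identifier "308", which is included only so that the sketch fills a two-column grid.

```python
from typing import List


def promote_participant(order: List[str], participant_id: str, target_index: int = 0) -> List[str]:
    """Return a new window order with the given participant moved to target_index."""
    if participant_id not in order:
        # Unknown participant: leave the arrangement unchanged.
        return list(order)
    reordered = [p for p in order if p != participant_id]
    reordered.insert(target_index, participant_id)
    return reordered


def to_grid(order: List[str], columns: int = 2) -> List[List[str]]:
    """Lay the window order out row by row into a grid with the given column count."""
    return [order[i:i + columns] for i in range(0, len(order), columns)]
```

Under these assumptions, a starting order of 302, 306, 304, 308 places participant 306 in the top right of a two-column grid, and promoting 306 to index 0 moves that window to the top left while preserving the relative order of the remaining windows.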

FIG. 4 is a diagram illustrating an exemplary computer that may perform processing in some embodiments. Exemplary computer 400 may perform operations consistent with some embodiments. The architecture of computer 400 is exemplary. Computers can be implemented in a variety of other ways. A wide variety of computers can be used in accordance with the embodiments herein.

Processor 401 may perform computing functions such as running computer programs. The volatile memory 402 may provide temporary storage of data for the processor 401. RAM is one kind of volatile memory. Volatile memory typically requires power to maintain its stored information. Storage 403 provides computer storage for data, instructions, and/or arbitrary information. Non-volatile memory, which can preserve data even when not powered and includes disks and flash memory, is an example of storage. Storage 403 may be organized as a file system, database, or in other ways. Data, instructions, and information may be loaded from storage 403 into volatile memory 402 for processing by the processor 401.

The computer 400 may include peripherals 405. Peripherals 405 may include input peripherals such as a keyboard, mouse, trackball, video camera, microphone, and other input devices. Peripherals 405 may also include output devices such as a display. Peripherals 405 may include removable media devices such as CD-R and DVD-R recorders/players. Communications device 406 may connect the computer 400 to an external medium. For example, communications device 406 may take the form of a network adapter that provides communications to a network. A computer 400 may also include a variety of other devices 404. The various components of the computer 400 may be connected by a connection medium such as a bus, crossbar, or network.

It will be appreciated that the present disclosure may include any one and up to all of the following examples.

Example 1. A method comprising: displaying, for each of a number of participants within a video communication session, a UI comprising: a number of participant windows corresponding to the number of participants, and a video for each of at least a subset of the participants, wherein the video is displayed within the corresponding participant window for the participant; analyzing, in real time, each video to detect a non-verbal cue from a participant; determining that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time; and displaying, within the UI of at least one of the participants, a prompt associated with the non-verbal cue.

Example 2. The method of Example 1, wherein analyzing each video to detect the non-verbal cue from the participant comprises: identifying a section of each video corresponding to the eyes of the participant; and analyzing, in real time, each video to detect sustained eye contact from a participant on a subject.

Example 3. The method of any of Examples 1-2, wherein the prompt associated with the non-verbal cue is displayed to the participant in the video associated with the non-verbal cue.

Example 4. The method of any of Examples 1-3, wherein the prompt displayed to the participant comprises one or more of: a recommendation to verbally engage with one or more participants within the video communication session, and a recommendation to engage with one or more UI components indicating an intent to verbally engage.

Example 5. The method of any of Examples 1-4, wherein the prompt associated with the non-verbal cue is displayed to one or more of: a host participant of the video communication session, and a currently speaking participant of the video communication session.

Example 6. The method of any of Examples 1-5, further comprising: for each participant for which the prompt is displayed: identifying a geographical location associated with that participant, and determining one or more cultural expectations of non-verbal cues based on the geographical location for the participant, wherein the prompt displayed for the participant is customized based on the one or more determined cultural expectations.

Example 7. The method of any of Examples 1-6, wherein the prompt displayed to the at least one participant comprises a notification that the participant in the video associated with the non-verbal cue intends to verbally engage.

Example 8. The method of any of Examples 1-7, wherein the prompt displayed to the at least one participant comprises a recommendation for the at least one participant to prompt the participant associated with the non-verbal cue for verbal input within the video communication session.

Example 9. The method of any of Examples 1-8, further comprising: identifying a geographical location of the participant associated with the non-verbal cue, wherein the detecting of the non-verbal cue comprises identifying a non-verbal cue from a list of non-verbal cues associated with the geographical location.

Example 10. The method of any of Examples 1-9, wherein the designated threshold of time is modified based on the geographical location of the participant associated with the non-verbal cue.

Example 11. The method of any of Examples 1-10, wherein one or more aspects of detecting the non-verbal cue on a subject within the video for one of the participants are performed by an AI algorithm.

Example 12. The method of Example 11, wherein the AI algorithm is trained on a plurality of prior video communication sessions.

Example 13. The method of any of Examples 1-12, further comprising: modifying a preexisting arrangement of the participant windows within the UI to place the participant associated with the non-verbal cue in a more prominent or higher position.

Example 14. The method of any of Examples 1-13, further comprising: receiving a behavioral profile for the participant associated with the non-verbal cue; and determining one or more non-verbal cues associated with the behavioral profile for the participant, wherein the prompt associated with the non-verbal cue is customized based on the determined non-verbal cues and behavioral profile.

Example 15. The method of any of Examples 1-14, further comprising: receiving behavioral profiles for one or more of: a host participant, and a currently speaking participant; and determining one or more expectations of non-verbal cues associated with the behavioral profiles for the participants, wherein the prompt associated with the non-verbal cue is customized based on the determined expectations of non-verbal cues and the behavioral profiles.

Example 16. The method of any of Examples 1-15, further comprising: detecting termination of the video communication session; and displaying, at a client device of at least one participant, one or more metrics relating to at least one of: determined participant engagement, detected non-verbal cues, identified geographical locations of participants, and determined cultural expectations of participants regarding non-verbal cues during the video communication session.

Example 17. The method of any of Examples 1-16, further comprising: detecting termination of the video communication session; and providing, at a client device of at least one participant, playback access to one or more recorded highlights of the video communication session, wherein the highlights are determined based on detected non-verbal cues associated with participant engagement.

Example 18. A non-transitory computer-readable medium containing instructions, comprising: instructions for displaying, for each of a number of participants within a video communication session, a UI comprising: a number of participant windows corresponding to the number of participants, and a video for each of at least a subset of the participants, wherein the video is displayed within the corresponding participant window for the participant; instructions for analyzing, in real time, each video to detect a non-verbal cue from a participant; instructions for determining that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time; and instructions for displaying, within the UI of at least one of the participants, a prompt associated with the non-verbal cue.

Example 19. The non-transitory computer-readable medium of Example 18, wherein analyzing each video to detect the non-verbal cue from the participant comprises: identifying a section of each video corresponding to the eyes of the participant; and analyzing, in real time, each video to detect sustained eye contact from a participant on a subject.

Example 20. The non-transitory computer-readable medium of any of Examples 18-19, wherein the prompt associated with the non-verbal cue is displayed to the participant in the video associated with the non-verbal cue.

Example 21. The non-transitory computer-readable medium of any of Examples 18-20, wherein the prompt displayed to the participant comprises one or more of: a recommendation to verbally engage with one or more participants within the video communication session, and a recommendation to engage with one or more UI components indicating an intent to verbally engage.

Example 22. The non-transitory computer-readable medium of any of Examples 18-21, wherein the prompt associated with the non-verbal cue is displayed to one or more of: a host participant of the video communication session, and a currently speaking participant of the video communication session.

Example 23. The non-transitory computer-readable medium of any of Examples 18-22, further comprising: for each participant for which the prompt is displayed: instructions for identifying a geographical location associated with that participant, and instructions for determining one or more cultural expectations of non-verbal cues based on the geographical location for the participant, wherein the prompt displayed for the participant is customized based on the one or more determined cultural expectations.

Example 24. The non-transitory computer-readable medium of any of Examples 18-23, wherein the prompt displayed to the at least one participant comprises a notification that the participant in the video associated with the non-verbal cue intends to verbally engage.

Example 25. The non-transitory computer-readable medium of any of Examples 18-24, wherein the prompt displayed to the at least one participant comprises a recommendation for the at least one participant to prompt the participant associated with the non-verbal cue for verbal input within the video communication session.

Example 26. The non-transitory computer-readable medium of any of Examples 18-25, further comprising: instructions for identifying a geographical location of the participant associated with the non-verbal cue, wherein the detecting of the non-verbal cue comprises identifying a non-verbal cue from a list of non-verbal cues associated with the geographical location.

Example 27. The non-transitory computer-readable medium of any of Examples 18-26, wherein the designated threshold of time is modified based on the geographical location of the participant associated with the non-verbal cue.

Example 28. The non-transitory computer-readable medium of any of Examples 18-27, wherein one or more aspects of detecting the non-verbal cue on a subject within the video for one of the participants are performed by an AI algorithm.

Example 29. The non-transitory computer-readable medium of Example 28, wherein the AI algorithm is trained on a plurality of prior video communication sessions.

Example 30. The non-transitory computer-readable medium of any of Examples 18-29, further comprising: instructions for modifying a preexisting arrangement of the participant windows within the UI to place the participant associated with the non-verbal cue in a more prominent or higher position.

Example 31. The non-transitory computer-readable medium of any of Examples 18-30, further comprising: instructions for receiving a behavioral profile for the participant associated with the non-verbal cue; and instructions for determining one or more non-verbal cues associated with the behavioral profile for the participant, wherein the prompt associated with the non-verbal cue is customized based on the determined non-verbal cues and behavioral profile.

Example 32. The non-transitory computer-readable medium of any of Examples 18-31, further comprising: instructions for receiving behavioral profiles for one or more of: a host participant, and a currently speaking participant; and instructions for determining one or more expectations of non-verbal cues associated with the behavioral profiles for the participants, wherein the prompt associated with the non-verbal cue is customized based on the determined expectations of non-verbal cues and the behavioral profiles.

Example 33. The non-transitory computer-readable medium of any of Examples 18-32, further comprising: instructions for detecting termination of the video communication session; and instructions for displaying, at a client device of at least one participant, one or more metrics relating to at least one of: determined participant engagement, detected non-verbal cues, identified geographical locations of participants, and determined cultural expectations of participants regarding non-verbal cues during the video communication session.

Example 34. The non-transitory computer-readable medium of any of Examples 18-33, further comprising: instructions for detecting termination of the video communication session; and instructions for providing, at a client device of at least one participant, playback access to one or more recorded highlights of the video communication session, wherein the highlights are determined based on detected non-verbal cues associated with participant engagement.

Example 35. A system comprising one or more processors configured to perform the operations of: displaying, for each of a number of participants within a video communication session, a UI comprising: a number of participant windows corresponding to the number of participants, and a video for each of at least a subset of the participants, wherein the video is displayed within the corresponding participant window for the participant; analyzing, in real time, each video to detect a non-verbal cue from a participant; determining that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time; and displaying, within the UI of at least one of the participants, a prompt associated with the non-verbal cue.

Example 36. The system of Example 35, wherein analyzing each video to detect the non-verbal cue from the participant comprises: identifying a section of each video corresponding to the eyes of the participant; and analyzing, in real time, each video to detect sustained eye contact from a participant on a subject.

Example 37. The system of any of Examples 35-36, wherein the prompt associated with the non-verbal cue is displayed to the participant in the video associated with the non-verbal cue.

Example 38. The system of any of Examples 35-37, wherein the prompt displayed to the participant comprises one or more of: a recommendation to verbally engage with one or more participants within the video communication session, and a recommendation to engage with one or more UI components indicating an intent to verbally engage.

Example 39. The system of any of Examples 35-38, wherein the prompt associated with the non-verbal cue is displayed to one or more of: a host participant of the video communication session, and a currently speaking participant of the video communication session.

Example 40. The system of any of Examples 35-39, further comprising: for each participant for which the prompt is displayed: identifying a geographical location associated with that participant, and determining one or more cultural expectations of non-verbal cues based on the geographical location for the participant, wherein the prompt displayed for the participant is customized based on the one or more determined cultural expectations.

Example 41. The system of any of Examples 35-40, wherein the prompt displayed to the at least one participant comprises a notification that the participant in the video associated with the non-verbal cue intends to verbally engage.

Example 42. The system of any of Examples 35-41, wherein the prompt displayed to the at least one participant comprises a recommendation for the at least one participant to prompt the participant associated with the non-verbal cue for verbal input within the video communication session.

Example 43. The system of any of Examples 35-42, further comprising: identifying a geographical location of the participant associated with the non-verbal cue, wherein the detecting of the non-verbal cue comprises identifying a non-verbal cue from a list of non-verbal cues associated with the geographical location.

Example 44. The system of any of Examples 35-43, wherein the designated threshold of time is modified based on the geographical location of the participant associated with the non-verbal cue.

Example 45. The system of any of Examples 35-44, wherein one or more aspects of detecting the non-verbal cue on a subject within the video for one of the participants are performed by an AI algorithm.

Example 46. The system of Example 45, wherein the AI algorithm is trained on a plurality of prior video communication sessions.

Example 47. The system of any of Examples 35-46, further comprising: modifying a preexisting arrangement of the participant windows within the UI to place the participant associated with the non-verbal cue in a more prominent or higher position.

Example 48. The system of any of Examples 35-47, further comprising: receiving a behavioral profile for the participant associated with the non-verbal cue; and determining one or more non-verbal cues associated with the behavioral profile for the participant, wherein the prompt associated with the non-verbal cue is customized based on the determined non-verbal cues and behavioral profile.

Example 49. The system of any of Examples 35-48, further comprising: receiving behavioral profiles for one or more of: a host participant, and a currently speaking participant; determining one or more expectations of non-verbal cues associated with the behavioral profiles for the participants, the prompt associated with the non-verbal cue is customized based on the determined expectations of non-verbal cues and the behavioral profiles.

Example 50. The system of any of Examples 35-49, further comprising: detecting termination of the video communication session; and displaying, at a client device of at least one participant, one or more metrics relating to at least one of: determined participant engagement, detected non-verbal cues, identified geographical locations of participants, and determined cultural expectations of participants regarding non-verbal cues during the video communication session.
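
For illustration only, the post-session metrics of Example 50 could be aggregated from a log of detected cues along the following lines. The CueEvent record, its fields, and the specific metrics computed are hypothetical; the disclosure does not prescribe a data model or a particular set of metrics.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CueEvent:
    participant_id: str  # participant in whose video the cue was detected
    cue_type: str        # hypothetical label, e.g. "eye_contact"
    duration_s: float    # how long the cue was sustained, in seconds


def summarize_session(events: List[CueEvent]) -> Dict[str, object]:
    """Aggregate detected cues into simple metrics for display after the session."""
    return {
        "total_cues": len(events),
        "cues_by_type": dict(Counter(e.cue_type for e in events)),
        "cues_by_participant": dict(Counter(e.participant_id for e in events)),
        "total_cue_duration_s": sum(e.duration_s for e in events),
    }
```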

Example 51. The system of any of Examples 35-50, further comprising: detecting termination of the video communication session; and providing, at a client device of at least one participant, playback access to one or more recorded highlights of the video communication session, the highlights are determined based on detected non-verbal cues associated with participant engagement.
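
One way the highlights of Example 51 might be derived, offered as a sketch under stated assumptions rather than as the disclosed method, is to take a fixed window of recording around each detected cue and merge windows that overlap. The function name highlight_intervals and the padding value are hypothetical.

```python
from typing import Iterable, List, Tuple


def highlight_intervals(cue_times_s: Iterable[float],
                        session_length_s: float,
                        padding_s: float = 5.0) -> List[Tuple[float, float]]:
    """Return merged (start, end) playback intervals around detected cues.

    Each cue contributes a window of padding_s seconds on either side,
    clipped to the bounds of the recording. Overlapping or touching
    windows are merged into a single highlight.
    """
    intervals: List[Tuple[float, float]] = []
    for t in sorted(cue_times_s):
        start = max(0.0, t - padding_s)
        end = min(session_length_s, t + padding_s)
        if intervals and start <= intervals[-1][1]:
            # Extend the previous highlight instead of starting a new one.
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))
    return intervals
```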

Some portions of the preceding detailed descriptions have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “identifying” or “determining” or “executing” or “performing” or “collecting” or “creating” or “sending” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage devices.

The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the intended purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the method. The structure for a variety of these systems will appear as set forth in the description above. In addition, the present disclosure is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosure as described herein.

The present disclosure may be provided as a computer program product, or software, that may include a machine-readable medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer). For example, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., a computer) readable storage medium such as a read only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices, etc.

In the foregoing disclosure, implementations of the disclosure have been described with reference to specific example implementations thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of implementations of the disclosure as set forth in the following claims. The disclosure and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.

1. A method, comprising: displaying, for each of a plurality of participants within a video communication session, a user interface (UI) comprising: a plurality of participant windows corresponding to the plurality of participants, and a video for each of at least a subset of the participants, wherein the video is displayed within the corresponding participant window for the participant; analyzing, in real time, each video to detect a non-verbal cue from a participant; determining that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time; identifying at least one attribute of the participant that corresponds to the video in which the non-verbal cue was detected; and displaying, within the UI of at least one of the participants, a prompt associated with the non-verbal cue and the at least one attribute.
2. The method of claim 1, wherein analyzing each video to detect the non-verbal cue from the participant comprises: identifying a section of each video corresponding to the eyes of the participant; and analyzing, in real time, each video to detect sustained eye contact from a participant on a subject.
3. The method of claim 1, wherein the prompt associated with the non-verbal cue is displayed to the participant in the video associated with the non-verbal cue.
4. The method of claim 3, wherein the prompt displayed to the participant comprises one or more of: a recommendation to verbally engage with one or more participants within the video communication session, and a recommendation to engage with one or more UI components indicating an intent to verbally engage.
5. The method of claim 1, wherein the prompt associated with the non-verbal cue is displayed to one or more of: a host participant of the video communication session, and a currently speaking participant of the video communication session.
6. The method of claim 1, further comprising: for each participant for which the prompt is displayed: identifying a geographical location associated with that participant, and determining one or more cultural expectations of non-verbal cues based on the geographical location for the participant, the prompt displayed for the participant is customized based on the one or more determined cultural expectations.
7. The method of claim 1, wherein the prompt displayed to the at least one participant comprises a notification that the participant in the video associated with the non-verbal cue intends to verbally engage.
8. The method of claim 1, wherein the prompt displayed to the at least one participant comprises a recommendation for the at least one participant to prompt the participant associated with the non-verbal cue for verbal input within the video communication session.
9. The method of claim 1, further comprising: identifying a geographical location of the participant associated with the non-verbal cue, the detecting of the non-verbal cue comprises identifying a non-verbal cue from a list of non-verbal cues associated with the geographical location, wherein the designated threshold of time is modified based on the geographical location of the participant associated with the non-verbal cue.
10. (canceled)
11. The method of claim 1, wherein one or more aspects of detecting non-verbal cue on a subject within the video for one of the participants is performed by an artificial intelligence (AI) algorithm, wherein the AI algorithm is trained on a plurality of prior video communication sessions.
12. (canceled)
13. The method of claim 1, further comprising: modifying a preexisting arrangement of the participant windows within the UI to place the participant associated with the non-verbal cue in a more prominent or higher position.
14. The method of claim 1, further comprising: receiving a behavioral profile for the participant associated with the non-verbal cue; and determining one or more non-verbal cues associated with the behavioral profile for the participant, the prompt associated with the non-verbal cue is customized based on the determined non-verbal cues and behavioral profile.
15. The method of claim 1, further comprising: receiving behavioral profiles for one or more of: a host participant, and a currently speaking participant; determining one or more expectations of non-verbal cues associated with the behavioral profiles for the participants, the prompt associated with the non-verbal cue is customized based on the determined expectations of non-verbal cues and the behavioral profiles.
16. The method of claim 1, further comprising: detecting termination of the video communication session; and displaying, at a client device of at least one participant, one or more metrics relating to at least one of: determined participant engagement, detected non-verbal cues, identified geographical locations of participants, and determined cultural expectations of participants regarding non-verbal cues during the video communication session.
17. The method of claim 1, further comprising: detecting termination of the video communication session; and providing, at a client device of at least one participant, playback access to one or more recorded highlights of the video communication session, the highlights are determined based on detected non-verbal cues associated with participant engagement.
18. A non-transitory computer-readable medium containing instructions, comprising: instructions for displaying, for each of a plurality of participants within a video communication session, a user interface (UI) comprising: a plurality of participant windows corresponding to the plurality of participants, and a video for each of at least a subset of the participants, wherein the video is displayed within the corresponding participant window for the participant; instructions for analyzing, in real time, each video to detect a non-verbal cue from a participant; determining that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time; instructions for identifying at least one attribute of the participant that corresponds to the video in which the non-verbal cue was detected; and instructions for displaying, within the UI of at least one of the participants, a prompt associated with the non-verbal cue and the at least one attribute.
19. The non-transitory computer-readable medium of claim 18, further comprising: for each participant for which the prompt is displayed: instructions for identifying a geographical location associated with that participant, and instructions for determining one or more cultural expectations of non-verbal cues based on the geographical location for the participant, the prompt displayed for the participant being customized based on the one or more determined cultural expectations.
20. A system comprising one or more processors configured to perform the operations of: displaying, for each of a plurality of participants within a video communication session, a user interface (UI) comprising: a plurality of participant windows corresponding to the plurality of participants, and a video for each of at least a subset of the participants, wherein the video is displayed within the corresponding participant window for the participant; analyzing, in real time, each video to detect a non-verbal cue from a participant; determining that the non-verbal cue has been sustained for a duration that exceeds a designated threshold of time; identifying at least one attribute of the participant that corresponds to the video in which the non-verbal cue was detected; and displaying, within the UI of at least one of the participants, a prompt associated with the non-verbal cue and the at least one attribute.
21. The method of claim 1, wherein identifying at least one attribute of the participant that corresponds to the video in which the non-verbal cue was detected further comprises: modifying the prompt according to the identified attribute.
22. The method of claim 21, wherein identifying at least one attribute of the participant further comprises: identifying at least one of: a geographical location attribute and a behavioral attribute of the participant associated with the non-verbal cue.